A secure key-exchange protocol with an absence of injective functions

The security of neural cryptography is investigated. A key-exchange protocol over a public channel
is studied in which the parties exchanging secret messages use multilayer neural networks that are
trained on their mutual output bits and synchronize to a time-dependent secret key. The weights
of the networks take integer values between -L and +L. Recently, an algorithm by which an eavesdropper
could break the key was introduced by Shamir et al. We show that the synchronization time
increases with $L^2$, while the probability of finding a successful attacker decreases exponentially with L.
Hence, for large L we find a secure key-exchange protocol which depends neither on number theory
nor on injective trapdoor functions used in conventional cryptography.

The ability to build a secure channel is one of the most challenging fields of research in
modern communication. One of the fundamental tasks of cryptography is to generate a
key-exchange protocol. Both partners start with private keys and transmit, using a public
protocol, their encrypted private keys which, after some transformations, lead to a common
secret key. A prototypical protocol for the generation of a common secret key is the
Diffie-Hellman key-exchange protocol.

All known secure key-exchange protocols use one-way functions, which are usually based on
number theory and in particular on the difficulty of factorizing a product of large prime
numbers. Typically, N bits (the length of the key) are transmitted between the two partners
and transformed by an injective function to the common key. This function can usually be
inverted by a secret trapdoor. Fundamental questions in the theory of cryptography are,
firstly, whether it is possible to build a secure cryptosystem which does not rely on number
theory; secondly, whether one can transmit fewer than N bits; and thirdly, whether one can
generate very long keys which can be directly used for one-time stream ciphers.

In our recent paper [1] we presented a novel principle of a key-exchange protocol based on a
new phenomenon which we observed for artificial neural networks. The protocol is based on the
synchronization of feedforward neural networks by mutual learning. It was shown by simulations
and by the analytical solution of the dynamics that synchronization is faster than the learning
of a naive attacker that is trying to reveal the weights of one of the parties. Our new
approach does not rely on previous agreement on public information, and the only secret of
each of the parties is the initial condition of its weights. The protocol continually generates
new keys and can be generalized to include the scenario of a key-exchange protocol among more
than two partners. Hence, we suggest a symmetric key-exchange protocol over a public channel
which simplifies the task of key management. The parties exchange a finite number of bits,
less than N, and can generate very long keys by fast calculations.

This protocol, for the originally proposed parameters (K = L = 3), was recently shown to be
breakable by an ensemble of advanced flipping attackers. In such an ensemble, there is a
probability that a low percentage of the attackers will find the key. An eavesdropper reading
all the decrypted messages can then identify the original plaintext as the one message that is
meaningful. This result raises the question of the existence of a secure key-exchange protocol
based on the synchronization of neural networks.

In this Letter we demonstrate that the security of our key-exchange protocol against the
flipping attack increases as the synchronization time increases. The mechanism used to vary
the synchronization time is the depth of the weights, i.e., the number of values for each
component of the synaptic weights. The main result in this Letter is that with increasing
depth the probability of an attacker finding the key decreases exponentially with the depth.
Hence we conjecture that a key-exchange protocol exists in the limit where the synchronization
time diverges. We also present a variant of our original scheme which includes a permutation
of a fraction of the weights.

In our original scheme each party of the secure channel, A and B, is represented by a
two-layered perceptron, exemplified here by a parity machine (PM) with K hidden units. More
precisely, the size of the input is KN and its components are denoted by $x_{kj}$,
k = 1, 2, ..., K and j = 1, ..., N. For simplicity, each input unit takes binary values,
$x_{kj} = \pm 1$. The K binary hidden units are denoted by $y_1, y_2, \ldots, y_K$. Our
architecture is characterized by non-overlapping receptive fields (a tree), where the weight
from the jth input unit to the kth hidden unit is denoted by $w_{kj}$, and the output bit O is
the product of the states of the hidden units. The weights can take integer values bounded in
magnitude by L, i.e., $w_{kj}$ can take the values -L, -L+1, ..., L.

The secret information of the parties is the initial values of the 2KN weights (KN per party).
The parties do not know the initial weights of the other party, which are used to construct
the common secret key.

Each network is then trained with the output of its partner. At each training step a new
common public input vector ($x_{kj}$) is needed for both parties. For a given input, the
output is calculated in the following two steps. In the first step, the states of the K hidden
units of the two parties, $y_k^{A/B}$, are determined from the corresponding fields

$$ y_k^{A/B} = \mathrm{sign}\left[ \sum_{j=1}^{N} w_{kj}^{A/B} x_{kj} \right] \tag{1} $$

In the case of zero field, $\sum_j w_{kj} x_{kj} = 0$, A/B sets the hidden unit to 1/-1. In
the next step the output $O^{A/B}$ is determined by the product of the hidden units,
$O^{A/B} = \prod_{m=1}^{K} y_m^{A/B}$. The output bit of each party is transmitted to its
partner. In the event of disagreement, $O^A \neq O^B$, the weights of the parties are updated
according to the following Hebbian learning rule

$$ \text{if } (O^{A/B} y_k^{A/B} > 0) \text{ then } w_{kj}^{A/B} = w_{kj}^{A/B} + O^{A/B} x_{kj} $$
$$ \text{if } (|w_{kj}^{A/B}| > L) \text{ then } w_{kj}^{A/B} = \mathrm{sign}(w_{kj}^{A/B})\, L \tag{2} $$

Only weights belonging to hidden units which are in the same state as their output unit are
updated. Note that from the knowledge of the output, the internal representation of the hidden
units is not uniquely determined because there is a $2^{K-1}$-fold degeneracy. As a
consequence, an attacker cannot know which weight vectors are updated according to equation
(2). Nevertheless, although parties A and B do not have more information than an attacker,
they still can synchronize.
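
The two-step output calculation of Eq. (1) and the update rule of Eq. (2) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the class name `ParityMachine`, the function `synchronize`, the random seed and the use of NumPy are all assumptions made for the example. With the rule exactly as written above (update on disagreement, "+" sign, zero field set to 1 for A and -1 for B) and odd K, the stationary state reached in this sketch is the one in which the two weight sets are equal up to a global sign, so the stopping test below checks for `A.w == -B.w`.

```python
import numpy as np

class ParityMachine:
    """Tree parity machine: K hidden units, N inputs each, integer weights in [-L, L]."""

    def __init__(self, K, N, L, zero_field_sign, rng):
        self.K, self.N, self.L = K, N, L
        self.zero_field_sign = zero_field_sign         # +1 for party A, -1 for party B
        self.w = rng.integers(-L, L + 1, size=(K, N))  # secret initial weights

    def forward(self, x):
        h = np.sum(self.w * x, axis=1)                 # local fields, Eq. (1)
        y = np.sign(h).astype(int)
        y[h == 0] = self.zero_field_sign               # zero-field convention
        self.y, self.O = y, int(np.prod(y))            # output = product of hidden units
        return self.O

    def update(self, x):
        """Eq. (2): move only the hidden units that agree with the own output."""
        mask = self.y == self.O
        self.w[mask] = np.clip(self.w[mask] + self.O * x[mask], -self.L, self.L)

def synchronize(K=3, N=100, L=3, seed=0, max_steps=100_000):
    """Return (number of public inputs used, A, B); the count is None if not synchronized."""
    rng = np.random.default_rng(seed)
    A = ParityMachine(K, N, L, +1, rng)
    B = ParityMachine(K, N, L, -1, rng)
    for t in range(max_steps):
        if np.array_equal(A.w, -B.w):                  # synchronized up to a global sign
            return t, A, B
        x = rng.choice([-1, 1], size=(K, N))           # common public input
        if A.forward(x) != B.forward(x):               # disagreement: both parties move
            A.update(x)
            B.update(x)
    return None, A, B
```

Calling `synchronize(K=3, N=100, L=3)` returns the number of public inputs consumed in that particular run; averaging this count over many seeds would give an estimate in the spirit of the average synchronization time, although the exact counting convention used in the paper is not specified here.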

The synchronization time is finite even in the thermodynamic limit. For K = L = 3, for
instance, the synchronization time $t_{av}$ converges to ~400 for large networks. This
observation was recently confirmed by an analytical solution of the presented model [5].
Surprisingly, in the limit of large N one needs to exchange only a few hundred bits to obtain
agreement between 3N components.

An attacker eavesdropping on the channel knows the algorithm as well as the actual mutual
outputs, hence he knows in which time steps the weights are changed. In addition, the attacker
knows the input $x_{kj}$ as well. However, the attacker does not know the initial conditions
of the weights of the parties and, as a consequence, even for the synchronized state the
internal representations of the hidden units of the parties are hidden from the attacker. As a
result he does not know which weights participate in the learning step. Note that for random
inputs all $2^{K-1}$ internal representations appear with equal probability at any stage of
the dynamical process. The strategy of a naive attacker which has the same architecture as the
parties is defined as follows. The attacker tries to imitate the moves of one of the parties,
A for instance. The attacker is trained using its internal representation, the input vector
and the output bit of A, and the training step is performed only if A moves (disagreement
between the parties). Note that the trained weights of a naive attacker are only those
belonging to hidden units that are in agreement with A. Simulations as well as the analytical
solution of the dynamics indicate that the learning time of a naive attacker is much longer
than the synchronization time. Hence our key-exchange protocol is robust against a large
ensemble of naive attackers.

Recently, an efficient flipping attack was presented. The strategy of a flipping attacker, C,
is as follows. In the event of a disagreement between the parties, $O^A \neq O^B$, and
$O^C = O^A$, the attacker moves as in the naive attack, following its internal representation,
the common input and $O^A$. In the case where the parties move but the attacker does not agree
with A, $O^A \neq O^B$ and $O^C \neq O^A$, the move consists of the following two steps. In the
first step the attacker flips the sign of one of its K hidden units without altering the
weights. The selected hidden unit is $K_0$, the one with the minimal absolute local field,

$$ K_0 = \min_m \{ |h_m^C| \} \tag{3} $$

where $h_m^C$ is the local field on the mth hidden unit of the attacker (see Eq. (1) for the
definition of the local field). After flipping one hidden unit the new output of the attacker
agrees with that of A. The learning step is then performed with the new internal
representation and with the strategy of the naive attacker. The flipping attack is based on
the idea that a flipping attacker develops some similarity with the parties. This similarity
can be measured by the fraction of equal weights, which is greater than 1/(2L + 1), the result
for a random attacker, or by a positive overlap between the weights of C and A. The minimal
change in the weights which preserves the already produced similarity with A, and which is
also consistent with the current input/output relation, is most probably achieved by changing
the weights of the hidden unit with the minimal absolute local field. Simulations as well as
the analytical solution of the dynamics of the flipping attackers [13] indicate that there is
a high probability of finding a successful attacker among a few dozen attackers. By a
successful attacker we mean an attacker with a learning time smaller than the synchronization
time between the parties. This attacker achieves the same weights as A before the
synchronization process terminates.

In Fig. 1 the average synchronization time, $t_{av}$, as well as its standard deviation, are
presented as a function of L for K = 3 and N = $10^3$. Results were averaged over ~ 10
different runs, where each run is characterized by different initial conditions for the
parties and a different set of inputs. Results indicate that the synchronization time
increases as $L^2$ for $L < O(\sqrt{N})$. This scaling is consistent with the analytical
solution, where for $L = \sqrt{N}$, $t_{av} \propto N$. For $L = O(\sqrt{N})$ we observe in
simulations a crossover to the scaling behavior $t_{av} \propto \sqrt{N} L$. This crossover
explains the deviation of $t_{av} \propto L^{\alpha}$, $\alpha = 1.91 < 2$ (see Fig. 1), and
furthermore $\alpha$ is expected to increase with N (see Fig. 4).
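
A single move of the attacker, covering both the naive case and the flip of Eq. (3), might look like the sketch below. The function names `hidden_states` and `attacker_step` are hypothetical, and two details are assumptions rather than statements from the text: the attacker uses the same zero-field convention as party A (hidden unit set to 1), and it applies the update rule of Eq. (2) with the output bit of A.

```python
import numpy as np

def hidden_states(w, x, zero_field_sign=1):
    """Local fields h and hidden-unit states y of a parity machine with weights w."""
    h = np.sum(w * x, axis=1)
    y = np.sign(h).astype(int)
    y[h == 0] = zero_field_sign        # assumed: attacker copies A's zero-field rule
    return h, y

def attacker_step(w_c, x, O_a, O_b, L):
    """One move of attacker C on public input x; modifies w_c in place.

    Returns False if the parties agree (nobody moves), True otherwise.
    """
    if O_a == O_b:
        return False
    h, y = hidden_states(w_c, x)
    if int(np.prod(y)) != O_a:
        # Flipping move: flip the hidden unit with minimal |local field|, Eq. (3),
        # without touching the weights, so that the output now agrees with A.
        k0 = int(np.argmin(np.abs(h)))
        y[k0] = -y[k0]
    # Naive learning step with the (possibly flipped) internal representation.
    mask = y == O_a
    w_c[mask] = np.clip(w_c[mask] + O_a * x[mask], -L, L)
    return True
```

For example, with weights `[[2, 1], [1, 0], [-2, -2]]`, an input of all ones and `O_a = 1`, `O_b = -1`, the local fields are 3, 1 and -4, so the attacker's output is -1; the second hidden unit has the smallest absolute field and is flipped, after which only the first hidden unit agrees with `O_a` and is trained.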



In Fig. 2 the fraction of successful flipping attackers, $P_{flip}$, is presented as a
function of L. In order to reduce fluctuations in our simulations we define a successful
attacker as one which has a fraction 0.98 of correct values for the weights at the
synchronization time between the parties. Fig. 2 indicates that the success rate drops
exponentially with L. To conclude, for $1 \ll L \ll \sqrt{N}$ the synchronization time
diverges polynomially while the probability of a successful attacker drops exponentially.
Hence for large L our construction is robust against the flipping attack. (In practice, for
L ~ 85 and N > $2 \cdot 10^4$, the complexity of an effective flipping attack is greater
than $2^{80}$.)

Finally we note that the complexity of the synchronization process for $1 < L < \sqrt{N}$ is
$O(L^2 N \log N)$. The factor log(N) is a result of a typical scenario of an exponential decay
of the overlap in the case of discrete weights. Hence, the complexity of generating a large
common key, $N \to \infty$, scales as O(log N) operations per weight.

FIG. 1. The average synchronization time, $t_{av}$, and its standard deviations as a function
of L for K = 3 and N = $10^3$. The regression fit for the dotted line is ~ $50 L^{1.91}$.

FIG. 2. The fraction of successful flipping attackers, $P_{flip}$, as a function of L for
K [...] the dotted line is ~ $1.4 e^{[...]}$.

FIG. 3. The learning time for a perceptron as a function of L for N = $10^3$ and $10^5$. The
regression power-law fits for N = $10^3$, $10^5$ are ~ $12 L^{1.77}$ and $17 L^{[...]}$,
respectively.



Let us now compare the complexity of an exhaustive attack with the complexity of the flipping
attack. For each input/output pair there are 4 possible configurations of the hidden units.
Hence to cover all possible training processes over a period t one has to deal with an
ensemble of $4^t$ scenarios. The crucial question is how the minimal necessary period $t_0$,
which ensures convergence to the weights of party A, scales with L. Since one of the attackers
among the $4^{t_0}$ has a series of internal representations identical to that of party A, the
problem is reduced to calculating the weight vector of a single perceptron. The learning time
as a function of L for a perceptron attacker (K = 1) is presented in Fig. 3, indicating that
for large N, $t_0 \sim L^2$, as expected from similar analytically solvable models. Hence the
complexity of an exhaustive attack scales exponentially with $L^2$, while for the flipping
attack the complexity is reduced to scale exponentially only with L.
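
The comparison can be written out explicitly. The following is a sketch under stated assumptions, not a result quoted from the text: c denotes a hypothetical prefactor in $t_0 \approx c L^2$, B denotes the decay constant of an assumed exponential law $P_{flip} \propto e^{-BL}$, and the cost of the flipping attack is taken to be the number of attackers needed to find one successful attacker, of order $1/P_{flip}$.

```latex
\begin{align}
  C_{\text{exhaustive}} &\sim 4^{t_0} = 2^{2 t_0}, \qquad t_0 \approx c\,L^2
    \;\Rightarrow\; \log_2 C_{\text{exhaustive}} \approx 2c\,L^2, \\
  C_{\text{flip}} &\sim \frac{1}{P_{flip}} \propto e^{BL}
    \;\Rightarrow\; \log_2 C_{\text{flip}} \approx \frac{B}{\ln 2}\,L + \text{const}.
\end{align}
```

In this form the number of bits of work grows quadratically in L for the exhaustive attack and only linearly in L for the flipping attack, which is the sense in which the flipping attack is the more efficient of the two.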

In the following we show that one can increase the security of our key-exchange protocol by
the following variant of our dynamical rules. The new ingredient is a permutation of a
fraction f of the weights, and the protocol is defined by the following steps. In the case
where the parties move, we assign to each hidden unit a permutation consisting of F = fN
pairs. Each pair consists of a random selection of two indices among the N of the trained
hidden unit [14]. The three permutations for the three hidden units (which differ from step to
step) are part of the public protocol. In the case where a hidden unit is trained we apply the
assigned permutation to this hidden unit. Note that the permutations are an ingredient that
prevents an attack in which one assigns to each weight (among 3N) a probability of being equal
to each of the 2L + 1 possible values. During the dynamics one may try to sharpen this
probability around one of the possible values. The permutations are responsible for mixing
these probabilities as a function of time.
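
The permutation step for one trained hidden unit can be sketched as below. The helper names `draw_pairs` and `apply_pairs` are hypothetical, and two points are assumptions not fixed by the text: each pair is interpreted as a swap of the two selected weights, and the two indices of a pair are drawn independently, so a pair may occasionally select the same index twice and leave the weights unchanged.

```python
import numpy as np

def draw_pairs(N, F, rng):
    """Draw F public index pairs (i, j), each index chosen at random among N."""
    return rng.integers(0, N, size=(F, 2))

def apply_pairs(w_k, pairs):
    """Swap the two weights of hidden unit k selected by each pair, in order (in place)."""
    for i, j in pairs:
        w_k[i], w_k[j] = w_k[j], w_k[i]
    return w_k
```

Because the pairs are public and both parties apply the same swaps, any relation that holds component by component between the two weight vectors of a hidden unit (for instance equality up to a global sign) is preserved by the step, whereas an attacker's per-weight estimates are shuffled along with the weights only if the attacker applies the same pairs.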

Results indicate that there are two different scaling behaviors for $t_{av}(L)$ and
$P_{flip}(L)$ as a function of the total number of permuted pairs, M, during the
synchronization process. As long as $M < \phi K N$, where $\phi \sim 1$, the permutations do
not affect the synchronization time, $t_{av}(L) = A L^2$; A ~ 60 is independent of the
permutations (A increases slightly with N and is asymptotically expected to scale with
log(N)). This scaling behavior can be observed for $L < \sqrt{3 \phi N / (60 f)}$. Hence in
order to observe the scaling $t_{av} \sim 60 L^2$ over a decade of L one has to choose a large
N and a very small F. In Fig. 4 the average synchronization time, $t_{av}$, and its standard
deviations as a function of L are presented for K = 3, N = $10^5$ and F = 0, 3 (the number of
permuted pairs is 3 per hidden unit). An insignificant deviation from the scaling behavior is
observed only for L > 32. In the inset of Fig. 4, similar results are presented for
N = $10^3$ with F = 3, and N = $10^4$ with F = 3 and 20. The deviation from the scaling
behavior is observed at a larger L as we increase N or as we decrease F
($L < \sqrt{3 \phi N / (60 f)}$). We also measured $P_{flip}(L)$ for L < 10 and
N = $10^4$, $10^5$ with F = 3 or F = 0. We found that $P_{flip}$ is independent of F and that
it decreases exponentially with L. The permutations do not affect the exponential drop,
$P_{flip} \propto e^{-BL}$, where B appears to increase with N. Note that although the
permutations do not affect $t_{av}$ and $P_{flip}$, the accumulated effect of the permutations
over the whole synchronization process is significant. In the event that the flipping attacker
does not use the permutations, a dramatic drop in $P_{flip}$ is observed [13]. The analysis of
the scaling behavior of $t_{av}$ and $P_{flip}$ in the second regime,
$L > \sqrt{3 \phi N / (60 f)}$, where huge fluctuations are observed, is beyond our
computational ability.

The scaling of $P_{flip}$ may be examined against other classes of attacks, including a
genetic attack, a majority attack and a flipping attack in which the weights of the selected
hidden unit are modified to actually flip the sign of the hidden unit. Our results indicate
that all such types of attacks are less efficient than the flipping attack presented. Hence,
for all known attacks, neural cryptography is secure in the limit of large values of L.

We thank Adi Shamir for critical comments on the 
manuscript. 



FIG. 4. The synchronization times, $t_{av}$, and their standard deviations as a function of L
for K = 3, N = $10^5$ with F = 0 (△) and F = 3 (○). The regression fit for 2 < L < 25, dotted
line, is ~ $57.3 L^{2.02}$. Inset: $t_{av}$ as a function of L for N = $10^3$, F = 3 (dashed
line), and N = $10^4$, F = 0, 3, 20 (△, ○, +).